NSF PAR Search | NSF Public Access Repository

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Li, Jiachen; Wang, Xinyao; Zhu, Sijie; Kuo, Chia-Wen; Xu, Lu; Chen, Fan; Jain, Jitesh; Shi, Humphrey; Wen, Longyin (December 2024, NeurIPS 2024)

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the importance of efficiently improving model capabilities on the vision side. Inspired by the successful application of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo, which incorporates Co-upcycled Top-K sparsely-gated Mixture-of-Experts blocks into both the vision encoder and the MLP connector, thereby enhancing multimodal LLMs with negligible additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then, during the visual instruction tuning stage, initializes each expert in the MoE block from the pre-trained MLP block, with auxiliary losses to ensure balanced loading of the experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks within each model size group, while training exclusively on open-source datasets.
Full Text Available
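
The CuMo abstract names two mechanisms: Top-K sparse gating over experts that are all initialized ("upcycled") from one pre-trained MLP block, and an auxiliary loss that keeps expert load balanced. The pure-Python sketch below illustrates both for a single MLP block under stated assumptions: the names `MLP`, `TopKMoE` and `load_balancing_loss` are hypothetical, gate weights are renormalized over the selected experts, and the balancing term is the Switch-Transformer-style N·Σ f_i·P_i, one common choice that may differ from the exact auxiliary losses CuMo uses. It is not the authors' implementation.

```python
import copy
import math


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


class MLP:
    """Dense two-layer MLP block (hypothetical): x -> W2 @ relu(W1 @ x)."""

    def __init__(self, W1, W2):
        self.W1, self.W2 = W1, W2

    def __call__(self, x):
        h = [max(0.0, a) for a in matvec(self.W1, x)]
        return matvec(self.W2, h)


class TopKMoE:
    """Top-K sparsely-gated MoE whose experts are upcycled from one pre-trained MLP."""

    def __init__(self, pretrained_mlp, num_experts, k, router_W):
        # Upcycling: each expert starts as an independent copy of the pre-trained MLP.
        self.experts = [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]
        self.k = k
        self.router_W = router_W  # shape: num_experts x d_in (assumed initialization)

    def route(self, x):
        """Return router probabilities and the indices of the top-k experts."""
        probs = softmax(matvec(self.router_W, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.k]
        return probs, top

    def __call__(self, x):
        probs, top = self.route(x)
        norm = sum(probs[i] for i in top)
        out = None
        for i in top:
            y = self.experts[i](x)  # only k experts run for this token
            w = probs[i] / norm     # renormalized gate weight (an assumption)
            out = [w * a for a in y] if out is None else [o + w * a for o, a in zip(out, y)]
        return out


def load_balancing_loss(moe, tokens):
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    f_i: fraction of top-k assignments routed to expert i.
    P_i: mean router probability for expert i over the batch.
    Equals 1.0 when router probabilities are uniform and grows when
    routing concentrates on a few experts.
    """
    n = len(moe.experts)
    counts = [0] * n
    prob_sums = [0.0] * n
    for x in tokens:
        probs, top = moe.route(x)
        for i in top:
            counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    total = len(tokens) * moe.k
    f = [c / total for c in counts]
    P = [s / len(tokens) for s in prob_sums]
    return n * sum(fi * pi for fi, pi in zip(f, P))


# Usage with toy weights (all values are made up for illustration).
mlp = MLP(W1=[[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]], W2=[[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
router = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.0, 0.0]]
moe = TopKMoE(mlp, num_experts=4, k=2, router_W=router)
x = [0.7, -0.3]
print(mlp(x), moe(x))  # both are approximately [1.0, 0.0]
```

Because every expert starts as an identical copy and the selected gate weights sum to 1, the upcycled MoE initially computes the same function as the dense MLP; the experts only diverge as training updates them, which is why upcycling is a cheaper starting point than training experts from scratch.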
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Li, Jiachen; Wang, Xinyao; Zhu, Sijie; Kuo, Chia-Wen; Xu, Lu; Chen, Fan; Jain, Jitesh; Shi, Humphrey; Wen, Longyin (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Full Text Available
Achieving sustainability of greenhouses by integrating stable semi-transparent organic photovoltaics

https://doi.org/10.1038/s41893-023-01071-2

Zhao, Yepin; Li, Zongqi; Deger, Caner; Wang, Minhuan; Peric, Miroslav; Yin, Yanfeng; Meng, Dong; Yang, Wenxin; Wang, Xinyao; Xing, Qiyu; et al. (May 2023, Nature Sustainability)

Full Text Available
Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression

https://doi.org/10.1109/ICCV.2019.00707

Wang, Xinyao; Bo, Liefeng; Fuxin, Li (October 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV))

Heatmap regression with a deep network has become one of the mainstream approaches to localizing facial landmarks. However, the loss function for heatmap regression is rarely studied. In this paper, we analyze the properties of an ideal loss function for heatmap regression in face alignment. We then propose a novel loss function, named Adaptive Wing loss, that adapts its shape to different types of ground truth heatmap pixels. This adaptability penalizes errors more heavily on foreground pixels and less heavily on background pixels. To address the imbalance between foreground and background pixels, we also propose a Weighted Loss Map, which assigns high weights to foreground and difficult background pixels so that training focuses on the pixels that are crucial to landmark localization. To further improve face alignment accuracy, we introduce boundary prediction and CoordConv with boundary coordinates. Extensive experiments on several benchmarks, including COFW, 300W and WFLW, show that our approach outperforms the state of the art by a significant margin on various evaluation metrics. The Adaptive Wing loss also helps other heatmap regression tasks.
Full Text Available
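
The abstract describes the Adaptive Wing loss only qualitatively. As a hedged sketch, the code below uses the piecewise form as we recall it from the paper: ω·ln(1 + |(y − ŷ)/ε|^(α − y)) for |y − ŷ| < θ and A·|y − ŷ| − C otherwise, where A and C make the two pieces meet with matching value and slope. The defaults ω = 14, θ = 0.5, ε = 1, α = 2.1, and the Weighted Loss Map settings (threshold 0.2, 3×3 dilation, W = 10), are assumptions to check against the paper; the function names are hypothetical.

```python
import math


def adaptive_wing(y, y_hat, omega=14.0, theta=0.5, epsilon=1.0, alpha=2.1):
    """Adaptive Wing loss for one heatmap pixel (ground truth y in [0, 1])."""
    d = abs(y - y_hat)
    p = alpha - y  # exponent adapts to the ground-truth pixel value
    if d < theta:
        return omega * math.log1p((d / epsilon) ** p)
    r = theta / epsilon
    # A and C make both branches agree in value and slope at d == theta.
    A = omega * (1.0 / (1.0 + r ** p)) * p * (r ** (p - 1.0)) / epsilon
    C = theta * A - omega * math.log1p(r ** p)
    return A * d - C


def weighted_loss_map(heatmap, threshold=0.2, W=10.0):
    """Weight W + 1 on the 3x3-dilated foreground mask, 1 elsewhere (assumed settings)."""
    rows, cols = len(heatmap), len(heatmap[0])
    mask = [[v >= threshold for v in row] for row in heatmap]

    def near_foreground(r, c):
        # Dilation: a pixel counts if any pixel in its 3x3 neighbourhood is foreground,
        # which also up-weights the "difficult" background pixels next to landmarks.
        return any(
            mask[rr][cc]
            for rr in range(max(0, r - 1), min(rows, r + 2))
            for cc in range(max(0, c - 1), min(cols, c + 2))
        )

    return [[W + 1.0 if near_foreground(r, c) else 1.0 for c in range(cols)] for r in range(rows)]


def weighted_awing_loss(gt, pred):
    """Mean over all pixels of (weight map * Adaptive Wing loss)."""
    weights = weighted_loss_map(gt)
    total, n = 0.0, 0
    for w_row, g_row, p_row in zip(weights, gt, pred):
        for w, g, p in zip(w_row, g_row, p_row):
            total += w * adaptive_wing(g, p)
            n += 1
    return total / n


# Same 0.1 error: the foreground pixel (y = 1) is penalized far more than
# the background pixel (y = 0), roughly 1.07 versus 0.11.
print(adaptive_wing(1.0, 0.9), adaptive_wing(0.0, 0.1))
```

The adaptive exponent α − y is what lets one function behave differently per pixel: near 2.1 on background pixels, small errors are damped almost quadratically, while near 1.1 on foreground pixels the loss stays steep around zero error.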